Two New Computational Methods for Universal DNA Barcoding: A Benchmark Using Barcode Sequences of Bacteria, Archaea, Animals, Fungi, and Land Plants
Authors
Abstract
Taxonomic identification of biological specimens based on DNA sequence information (a.k.a. DNA barcoding) is becoming increasingly common in biodiversity science. Although several methods have been proposed, many of them are not universally applicable due to the need for prerequisite phylogenetic/machine-learning analyses, the need for huge computational resources, or the lack of a firm theoretical background. Here, we propose two new computational methods of DNA barcoding and show a benchmark for bacterial/archaeal 16S, animal COX1, fungal internal transcribed spacer, and three plant chloroplast (rbcL, matK, and trnH-psbA) barcode loci that can be used to compare the performance of existing and new methods. The benchmark was performed under two alternative situations: query sequences were available in the corresponding reference sequence databases in one, but were not available in the other. In the former situation, the commonly used "1-nearest-neighbor" (1-NN) method, which assigns the taxonomic information of the most similar sequences in a reference database (i.e., BLAST-top-hit reference sequence) to a query, displays the highest rate and highest precision of successful taxonomic identification. However, in the latter situation, the 1-NN method produced extremely high rates of misidentification for all the barcode loci examined. In contrast, one of our new methods, the query-centric auto-k-nearest-neighbor (QCauto) method, consistently produced low rates of misidentification for all the loci examined in both situations. These results indicate that the 1-NN method is most suitable if the reference sequences of all potentially observable species are available in databases; otherwise, the QCauto method returns the most reliable identification results. The benchmark results also indicated that the taxon coverage of reference sequences is far from complete for genus or species level identification in all the barcode loci examined. Therefore, we need to accelerate the registration of reference barcode sequences to apply high-throughput DNA barcoding to genus or species level identification in biodiversity research.
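The contrast the abstract draws between the two benchmark situations can be illustrated with a toy sketch. The 1-NN function below follows the abstract's description (assign the taxon of the most similar reference). The second function is only a hypothetical conservative alternative written for illustration: it is not the QCauto method, and the names `p_distance`, `identify_with_consensus`, and `max_distance`, the fixed cutoff, and the toy sequences are all assumptions. Real pipelines would use BLAST similarity rather than this simple mismatch proportion.

```python
def p_distance(a, b):
    """Proportion of mismatched sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def identify_1nn(query, references):
    """1-NN: return the taxon of the single most similar reference sequence.

    Always returns a taxon, even when every reference is very distant.
    """
    _, taxon = min(references, key=lambda ref: p_distance(query, ref[0]))
    return taxon


def identify_with_consensus(query, references, max_distance):
    """Hypothetical conservative rule (an assumption, NOT the paper's QCauto).

    Collect references within max_distance of the query and return their
    taxon only if they all agree; otherwise return None (unidentified).
    """
    taxa = {taxon for seq, taxon in references
            if p_distance(query, seq) <= max_distance}
    return taxa.pop() if len(taxa) == 1 else None


# Toy reference database: (aligned sequence, taxon) pairs, invented for illustration.
REFERENCES = [
    ("ACGTACGTAC", "Species A"),
    ("ACGTACGTTC", "Species A"),
    ("TTGTACGGAC", "Species B"),
]

known_query = "ACGTACGTAC"  # identical to a reference sequence
novel_query = "GGCCTTAAGG"  # no close relative in the database

print(identify_1nn(known_query, REFERENCES))
print(identify_1nn(novel_query, REFERENCES))
print(identify_with_consensus(known_query, REFERENCES, 0.15))
print(identify_with_consensus(novel_query, REFERENCES, 0.15))
```

In this toy setup, 1-NN assigns a species to the novel query even though nothing in the database resembles it, which is the kind of misidentification the abstract reports when queries are missing from the reference database. The conservative rule leaves that query unidentified instead.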
Similar resources
Correction: Two New Computational Methods for Universal DNA Barcoding: A Benchmark Using Barcode Sequences of Bacteria, Archaea, Animals, Fungi, and Land Plants
In the third sentence of the first paragraph of sub-subsection “Query-Centric Auto-k-Nearest-Neighbor (QCauto) Method”, within the Materials and Methods section, the variable of expression is incorrect. The correct sentence should read: A BLAST-search of reference sequences similar to the query was then performed, and subsequently, all the neighborhood sequences (Ns) whose distance to the query ...
DNA Barcoding: a new tool with wide array of applications
DNA barcoding is a term introduced into the scientific literature by Hebert and coworkers almost a decade ago. The concept of barcoding alone is well known to the public: a series of black bars printed on many commercial products (Universal Product Code), which are used to distinguish different products. Advances made in molecular biology and molecular techniques in the late 20th century, e.g. sequen...
Phylogenetic Assessment of Some Species of Crocus Genus Using DNA Barcoding
DNA barcoding is a simple method for the identification of any species using a short genetic sequence from a standard genome section. The present study aimed at examining the nuclear and chloroplast diversity as well as the phylogenetic relationships of eight species of saffron including four spring-flowering and five autumn-flowering species from different parts of Iran, using the nuclear barc...
Medical Archives and Manuscripts News, 2005
Background. A useful DNA barcode requires sufficient sequence variation to distinguish between species and ease of application across a broad range of taxa. Discovery of a DNA barcode for land plants has been limited by intrinsically lower rates of sequence evolution in plant genomes than that observed in animals. This low rate has complicated the trade-off in finding a locus that is universal ...
Research Article: Molecular genetic divergence of five genera of cypriniform fish in Iran assessed by DNA barcoding
The present study represents a comprehensive molecular assessment of some families of freshwater fishes in Iran. We analyzed cytochrome oxidase I (COI) sequences for five genera of cypriniform fishes from Iran. The present investigation provides data on the genetic structure of some species of Nemachilidae including Paraschistura bampurensis, Oxynoemacheilus kiabii and Turcinemacheilus saadii and Leuc...
Journal title:
Volume 8, Issue
Pages -
Publication date: 2013